[SPARK-1726] [SPARK-2567] Eliminate zombie stages in UI. #1566
kayousterhout wants to merge 1 commit into apache:master
Due to problems with when we update runningStages (in DAGScheduler.scala) and how we decide to send a SparkListenerStageCompleted message to SparkListeners, sometimes stages can be shown as "running" in the UI forever (even after they have failed). This issue can manifest when stages are resubmitted with 0 tasks, or when the DAGScheduler catches non-serializable tasks. The problem also caused a (small) memory leak in the DAGScheduler, where stages could stay in runningStages forever. This commit fixes that problem and adds a unit test.
QA tests have started for PR 1566. This patch merges cleanly.
To clarify what's going on here: prior to my change, we added a stage to runningStages here, after calling submitMissingTasks (that is, after the code I modified below had already executed). This could lead to a memory leak: if the stage needed to be aborted in submitMissingTasks (due to a NotSerializableException, for example), it was added to runningStages afterwards and never removed. It also meant that the DAGScheduler sent a SparkListenerStageSubmitted event to the UI but never a SparkListenerStageCompleted, because on line 1072 we only send a SparkListenerStageCompleted event if the stage is in runningStages.
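The ordering problem described above can be illustrated with a small, self-contained sketch. This is not the DAGScheduler code: `MiniScheduler`, `abortStage`, the event case classes, and the `registerBeforeSubmit` flag are all hypothetical names made up for illustration, and registering the stage before submitting its tasks is only an assumption about the general shape of the fix.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified model of the scheduler; not Spark's actual code.
object ZombieStageSketch {
  sealed trait Event
  final case class StageSubmitted(stageId: Int) extends Event
  final case class StageCompleted(stageId: Int) extends Event

  final class MiniScheduler(registerBeforeSubmit: Boolean) {
    val runningStages = mutable.HashSet.empty[Int]
    val events = mutable.ArrayBuffer.empty[Event]

    // Mirrors the rule from the comment above: a completion event is only
    // sent for stages that are currently in runningStages.
    private def abortStage(stageId: Int): Unit = {
      if (runningStages.remove(stageId)) {
        events += StageCompleted(stageId)
      }
    }

    // Stand-in for submitMissingTasks: announces the stage, then aborts it
    // if its tasks cannot be serialized.
    private def submitMissingTasks(stageId: Int, serializable: Boolean): Unit = {
      events += StageSubmitted(stageId)
      if (!serializable) abortStage(stageId)
    }

    def submitStage(stageId: Int, serializable: Boolean): Unit = {
      if (registerBeforeSubmit) {
        // Assumed fixed ordering: the stage is tracked before anything can abort it.
        runningStages += stageId
        submitMissingTasks(stageId, serializable)
      } else {
        // Old ordering: the stage is added even if it was already aborted.
        submitMissingTasks(stageId, serializable)
        runningStages += stageId
      }
    }
  }

  // Submits one stage with non-serializable tasks and returns the final
  // contents of runningStages together with the events that were emitted.
  def run(registerBeforeSubmit: Boolean): (Set[Int], List[Event]) = {
    val scheduler = new MiniScheduler(registerBeforeSubmit)
    scheduler.submitStage(stageId = 0, serializable = false)
    (scheduler.runningStages.toSet, scheduler.events.toList)
  }
}
```

With `registerBeforeSubmit = false`, the aborted stage stays in `runningStages` and only a `StageSubmitted` event is recorded, which is the "zombie" a listener-driven UI would keep showing as running. With `registerBeforeSubmit = true`, the abort removes the stage and a matching `StageCompleted` event is emitted.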
Makes sense. LGTM
Thanks for the quick review @markhamstra!
QA results for PR 1566:
Looks good to me too. I've merged this.
BTW I've merged this only into 1.1 because the patch didn't apply cleanly on 1.0. If you think it's important, we can also add it to 1.0.x, but it doesn't seem like that big of a showstopper.
Yeah that seems fine to me -- thanks Matei!
Due to problems with when we update runningStages (in DAGScheduler.scala) and how we decide to send a SparkListenerStageCompleted message to SparkListeners, sometimes stages can be shown as "running" in the UI forever (even after they have failed). This issue can manifest when stages are resubmitted with 0 tasks, or when the DAGScheduler catches non-serializable tasks. The problem also resulted in a (small) memory leak in the DAGScheduler, where stages can stay in runningStages forever. This commit fixes that problem and adds a unit test.

Thanks tsudukim for helping to look into this issue!

cc markhamstra rxin

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes apache#1566 from kayousterhout/dag_fix and squashes the following commits:

217d74b [Kay Ousterhout] [SPARK-1726] [SPARK-2567] Eliminate zombie stages in UI.